Chapter 1



Introduction 



1.1 Motivation 

Texturing is an important technique for adding fine details to geometric surfaces. Due
to the limited size of texture memory on current graphics cards, artists have to work
with a fixed texture budget and reuse this limited set of textures throughout a scene.
Unfortunately, this reuse introduces visually disturbing patterns. Once these patterns
are spotted, they remind the viewer of looking at something artificial. To avoid this,
artists traditionally employ a technique called multitexturing, where multiple textures
are blended together to make the scene less repetitive. Although this technique has
worked quite well so far, it has two downsides. Firstly, it is computationally expensive
to use multiple texture look-ups per fragment, because this stresses the memory
throughput, which is often the main bottleneck. Secondly, it is not a very intuitive
approach and thus impacts the workflow of artists negatively.

Virtual Texturing is a technique that can be used to texture every spot in the scene
uniquely. It supports virtually unlimited detail by dividing the texture into a hierarchy
of smaller parts, so-called pages, and keeping only those parts in memory that are
currently needed. This makes it possible to reserve an area of the texture exclusively
for every surface, so that each surface can really be textured in a unique fashion, even
if the complete texture would exceed the video card's memory. This approach suits the
work of artists, because they can concentrate on the details in every area individually
to achieve the intended look without worrying about the texture budget.

Since Virtual Texturing can only keep a limited set of texture parts in memory, it
has to progressively load new parts as soon as they are needed. For example, changes
to the viewpoint can cause pages to become visible that are not currently available
in memory. As long as they stay unavailable, the viewer will see visual artifacts,
which can be very disturbing. The obvious way to reduce these artifacts is to optimize
the streaming itself. This does not help, however, when the latency is high or the
bandwidth is limited, as in the case of internet streaming. In these cases page misses
are inevitable, so it is important to decide in which order the required pages are
streamed, in order to minimize the artifacts and optimize the visual experience for
the viewer.

1.2 Contributions of this Thesis 

From a practical point of view, this thesis deals with the implementation of a renderer
that employs Virtual Texturing in order to use textures that are large enough to
texture every spot of a scene individually.

Furthermore, we introduce an accompanying tool chain, which allows its user to create
textures of multiple gigabytes and to process existing data in such a way that it can
be used with the generated textures.

From a more analytical viewpoint, we investigated several ideas that aim to improve
the visual quality when latency is inevitable. This includes an analysis of several
heuristics that indicate which pages have to be streamed next, and a technique
that tries to predict future needs. In addition, we developed a method
that can be used to measure the performance of these techniques. Based on
the results of this study, we present a combination of these ideas that yields better
results than the basic heuristics.

1.3 Organization of the Thesis 

The present thesis is divided into five chapters. In the second chapter
we briefly review related work in this field. The third chapter gives an overview of
the basic concepts behind Virtual Texturing, so that we can discuss several details of
our rendering system and its accompanying tool chain throughout the rest of the
chapter. In chapter four we present and discuss the results of the previously mentioned
analysis. The last chapter closes the thesis with a conclusion and a look at some
subjects that could be worthwhile to investigate in the future.

Furthermore, an appendix at the end makes the overall thesis more readable and
gives the reader the chance to review certain mathematical formulas that
we employ within our discussion.

Related work



This chapter is split into two sections. The first section starts with two older
approaches that use huge amounts of texture data for rendering interactive scenes and
hence are directly related to the topic of this thesis, and then reviews various
publications on Virtual Texturing itself. The second section reviews existing
literature on Sparse Voxel Octrees, a technique that extends the fundamental idea of
Virtual Texturing to the visualization of voxels in order to provide a rendering
system with unique and highly detailed geometry.



2.1 Virtual Texturing 

Tanner et al. [TMJ98] described the idea of a Clipmap, which was an early attempt to
emulate the residence of a single large texture within the limited memory space. It is
basically a mipmap pyramid in which each level is clipped to a certain region that is
defined by a so-called clipcenter. The clipcenter is calculated based on the viewpoint,
so that only the visible parts of the texture data have to be stored in memory. Each
clipcenter represents a specific region within its mip level. Due to this fact it works
well in the case of a terrain, whose parametrization exploits this property. But in the
case of arbitrarily textured geometry it can happen that parts become visible that are
not necessarily near to each other within the texture. In this case the
clipcenter would cover a large region that contains unused texture parts.

In 2004 Darbon et al. [LDN04] proposed in a technical report a system that can be
employed to use huge amounts of texture data on arbitrarily textured meshes. In
contrast to using one large texture, it works with a set of several texture files that
are used to texture different meshes. For each mesh a so-called Texture Load Map is used
to identify texture tiles that are visible from the current viewpoint. In order to do so,
it renders the processed geometry into texture space. The output of this rendering
process is a texture whose size matches the number of tiles within the texture. This means
that if a texel in the texture has been rendered, the corresponding tile is needed.
For each tile that is not currently available in memory a load request is sent to a
Texture Producer, which has access to all the texture files on the hard drive. As soon as
the requested tiles become available, the system updates a corresponding Tile Pool,
which can be used in combination with a dynamic indirection table to render a mesh.

Using a single large texture to render arbitrarily textured meshes in an efficient
manner was pioneered by id Software with a technique called Megatexturing. It has been
used to texture the landscape environment of their commercial product Enemy
Territory: Quake Wars. For their next generation engine id Tech 5, which is used in the
upcoming game Rage, id Software extended the idea further to uniquely texture every
object in the game, not only the terrain. While implementation details are not publicly
available, van Waveren addressed some of the issues of this technique in 2009 [vW09].
Sean Barrett gave a presentation about his implementation of Virtual Texturing at the
Game Developers Conference 2008 [Bar08]. It features a detailed description of his
fragment shader that can be used to identify the needed texture parts in screen
space. Furthermore, it describes how to render the currently visible part of the scene
with just the available set of texture parts.

Mittring [MG08] discussed the advantages and disadvantages of several solutions for
different Virtual Texturing aspects like rendering, content creation and data streaming.
Instead of describing a complete solution, the report is a collection of several ideas that
have been investigated by Crytek GmbH.

2.2 Sparse Voxel Octrees 

Sparse Voxel Octrees is a technique that replaces the currently used triangle
rasterization pipeline by raycasting in combination with an octree of voxels. The
octree represents a geometry set that, like the texture data in the case of Virtual
Texturing, is far too large to be contained completely within memory. It can be
considered an extension of Virtual Texturing, since it uses the same underlying
principles and suffers from similar problems: its visual quality also depends on how
fast the data can be streamed from the backing storage. Due to this fact the results of
this thesis could also be relevant for future research on Sparse Voxel Octrees.

In 2008 Jon Olick described the idea of Sparse Voxel Octrees and the possibilities it
provides for the game development process [Oli08]. In the same year Crassin et al.
proposed their implementation called Interactive GigaVoxels [CNL08].

In 2010 Laine and Karras published a technical report on their Sparse Voxel Octree
implementation [LK10]. It contains a discussion of several drawbacks of using voxels
instead of triangles and features an analysis of the underlying memory usage.

3.1 Basic concepts



Before we delve into the details of our implementation in Section 3.2, let us start with
a discussion of the basic concepts behind Virtual Texturing and introduce several
keywords that we will encounter quite often in the rest of this chapter.

The basic idea of Virtual Texturing stems from the design of operating systems,
where a similar memory management technique called Paging is used to give a
running process the impression of having one large address space contiguously available
in random access memory, although its parts may be scattered incoherently within the
memory. They may not even be available currently and hence have to be loaded as
soon as they are needed.

Virtual Texturing transfers this idea to the process of texturing in order to
provide a rendering system with a texture that is far too large to fit into the available
memory of the graphics hardware. It emulates the residence of the large texture by
actively managing a small working set within the scarce texture memory.

We can see an abstract overview of this idea in Figure 3.1. The shown system streams
parts of a large texture set, called Virtual Texture, into the faster but also smaller
graphics memory, so that it can use just the available set to render the scene in the
best possible fashion. By best possible fashion we refer to the fact that, in order to
guarantee interactive frame rates, the system has to render the scene even though some
of the needed parts are still not available in the scarce memory, called Page Cache.



3.1.1 Virtual Texture 

The Virtual Texture, whose parts are progressively loaded by the rendering system,
resides on a slower backing storage (e.g. a hard disk drive or an internet server) that
has enough capacity to store it completely. It consists of a very large texture that is
accompanied by all its mipmaps. The highest resolution is chosen to be square with a
power-of-two edge length (e.g. 32768 x 32768 pixels). Every mipmap has exactly
twice the width and height of the next lower-resolution mipmap. Doing so ensures
that all mipmaps within the Virtual Texture have the same properties as the
highest resolution.

Figure 3.1: High level overview of a Virtual Texturing system. The system streams
the pages of a Virtual Texture into a much smaller page cache, which is then used in
combination with a page table to render the scene with the currently available set.

We number the mip levels in ascending order, starting with 0 at the lowest
resolution.

3.1.1.1 Pages 

All mipmaps of a Virtual Texture are divided into equally sized parts in order to
make the task of selecting and managing the set of texture parts as easy and efficient
as possible. Because of the assumptions we made above, every mipmap is square and
its edge length is a power of two. We therefore choose a page size $w_{page} \times h_{page}$
that exhibits the same properties (e.g. 128 x 128 pixels), so that we get a whole
number of pages within every level of detail. We define the number $n_i$ of pages within
mipmap $i$ to be

$$n_i = 2^i \cdot 2^i$$

This automatically implies that the dimension of mipmap 0 coincides with the
chosen page size and that there is exactly one page within this level of detail. Furthermore,
it guarantees that the number of pages increases by a factor of 4 with every mip level.
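The page counts can be checked with a few lines of code. The following is a minimal sketch rather than part of the described system; the function names and the parameters `textureSize` and `pageSize` (edge lengths in pixels) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of pages in mip level i, where level 0 is the single-page mipmap:
// 2^i pages per row times 2^i pages per column.
std::uint64_t pagesInMipLevel(unsigned i) {
    assert(i < 32);  // keeps 4^i within 64 bits
    return (std::uint64_t{1} << i) * (std::uint64_t{1} << i);
}

// Index of the highest-resolution mip level of a square power-of-two texture.
// textureSize and pageSize are hypothetical parameters (edge lengths in pixels).
unsigned highestMipLevel(std::uint32_t textureSize, std::uint32_t pageSize) {
    assert(pageSize > 0 && textureSize >= pageSize);
    assert(textureSize % pageSize == 0);
    unsigned level = 0;
    // Every halving of the page count per row corresponds to one mip level.
    for (std::uint32_t pagesPerRow = textureSize / pageSize; pagesPerRow > 1; pagesPerRow /= 2) {
        ++level;
    }
    return level;
}
```

For the example sizes mentioned above (a 32768 x 32768 texture with 128 x 128 pages) the highest-resolution mipmap has 256 pages per row, which corresponds to mip level 8.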



Figure 3.2: All mipmaps of a Virtual Texture are split into equally sized parts, called
pages. Starting with one page on the lowest mipmap, the number of pages within a
mipmap increases by a factor of 4 with every mip level.

To identify each page, we can use either a relative or an absolute enumeration
scheme. The example in Figure 3.2 shows an absolute identification
system. In the relative scheme we would identify, for example, page 6 as page 1 on
mip level 2. By knowing the number of pages within every mipmap we can transform
the page index from the relative into the absolute scheme quite easily. Let $p_{rel}$ denote
the relative index of a page within mipmap $m$. Then the absolute index $p_{abs}$ will be



$$p_{abs} = p_{rel} + \sum_{i=0}^{m-1} n_i = p_{rel} + \sum_{i=0}^{m-1} \left( 2^i \cdot 2^i \right)$$



For the relative enumeration scheme we can furthermore split the page index into two
coordinates $p_x, p_y$ that can be calculated by

$$p_x = p_{rel} \bmod 2^m \qquad p_y = \left\lfloor \frac{p_{rel}}{2^m} \right\rfloor$$

For the other direction we can use

$$p_{rel} = p_x + p_y \cdot 2^m$$
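The conversions between the enumeration schemes can be sketched as follows. This is an illustrative sketch that assumes row-major page order within a mip level; the type `PageCoord` and all function names are hypothetical and not taken from the implementation described in this thesis.

```cpp
#include <cassert>
#include <cstdint>

struct PageCoord { std::uint32_t x, y; };

// Number of pages in all mip levels below m: the sum of 4^i for i in [0, m).
std::uint64_t pagesBelow(unsigned m) {
    assert(m < 32);
    std::uint64_t sum = 0;
    for (unsigned i = 0; i < m; ++i) sum += std::uint64_t{1} << (2 * i);
    return sum;
}

// Relative index within mip level m -> absolute index.
std::uint64_t toAbsolute(std::uint64_t relIndex, unsigned m) {
    return relIndex + pagesBelow(m);
}

// Relative index -> page coordinates (mip level m has 2^m pages per row).
PageCoord toCoord(std::uint64_t relIndex, unsigned m) {
    assert(m < 32);
    const std::uint64_t pagesPerRow = std::uint64_t{1} << m;
    return { static_cast<std::uint32_t>(relIndex % pagesPerRow),
             static_cast<std::uint32_t>(relIndex / pagesPerRow) };
}

// Page coordinates -> relative index.
std::uint64_t toRelative(PageCoord c, unsigned m) {
    assert(m < 32);
    return c.x + (static_cast<std::uint64_t>(c.y) << m);
}
```

With these definitions, relative page 1 on mip level 2 maps to the absolute index 1 + (1 + 4) = 6, which matches the example given in the text.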



3.1.1.2 Page Hierarchy 

By splitting all mipmaps in the way described above, we get a useful property that
can be exploited during rendering in order to have a fallback in case a needed
page is not currently available in the cache.

Figure 3.2 reveals that there is a relation between the pages of successive
mipmaps. Every page within a mipmap can be connected with exactly four
pages of the next higher level of detail. Doing so for all mipmaps yields a
quadtree like the one shown in Figure 3.3. Pages that are connected by an edge are
called parents and children. All pages on the path between a page and the
root of the quadtree are called ancestors of the considered page.

Figure 3.3: Page Hierarchy for three mipmaps of a Virtual Texture. Every node, except
a leaf, is connected to exactly those four pages within the next mipmap that contain
the same texture information, just in a higher resolution.

As stated above, a system can exploit this relationship during rendering in order
to have a fallback: it can use the lower-resolution data of any ancestor to texture
the scene while it waits for the needed page to be loaded.

3.1.2 Page Cache 

Since Virtual Textures are in most cases too large to be contained completely within the
texture memory of the graphics card, a rendering system has to keep only those pages
in memory that are currently needed. This active management of texture data takes
place in a reserved area within the graphics card's memory called Page Cache. Like the
mipmaps of a Virtual Texture, the cache is split into equally sized parts, called frames.
These frames have exactly the same size as the pages, so that each page of the texture
fits into any frame and can simply be exchanged for another page.



All frames together form a grid as shown in Figure 3.4 (a). We identify each frame
within the grid by a pair of size-independent coordinates $f_x$ and $f_y$. Since every
frame has exactly the same size as every page, we can calculate the offset
$x_{offset}, y_{offset}$ of a frame within the cache by

$$x_{offset} = f_x \cdot w_{page} \qquad y_{offset} = f_y \cdot h_{page}$$

with $w_{page} = h_{page}$, since the pages are square.



3.1.3 Virtual- & Physical coordinates 

All the geometry that is processed by a rendering system uses texture coordinates
that have been created as if the complete texture at its highest resolution were
available. We call these coordinates virtual coordinates, since they stand in contrast to
the reality that is dictated by the current state of the page cache. Because a page can
reside at any position within the page cache, we need, as shown in Figure 3.4 (b),
different coordinates to sample from the cache. A rendering system has to translate
the given virtual texture coordinates into these so-called physical coordinates, or it
will fail to render the scene correctly.

Figure 3.4: (a) Every frame in the cache is identified by two size-independent
coordinates $f_x, f_y$. The highlighted frame, for example, is identified as
$f_x = 2, f_y = 0$. (b) A page can reside at any position within the page cache, so we
need different texture coordinates in order to sample the texture data correctly.

3.1.4 Page Table 

In order to transform virtual coordinates into physical ones, the rendering system needs
to know in which frame a specific page resides or which ancestor can be used as a
fallback. To keep track of this information, the system maintains a
so-called Page Table. It provides an entry for each page, which can be retrieved during
rendering in order to get the coordinates of the frame from which the system can sample
the appropriate texture data. An example of the data that is stored within the page
table is shown in Figure 3.5.

Figure 3.5: Example of the data stored within the page table. If a page is available
(green), its entry within the page table contains the coordinates of the frame in
which the page resides. Otherwise (red) it contains the coordinates of the frame that
holds the best available fallback.

Since the page cache is modified often, it is necessary to update this table in an
efficient manner.

3.2 Renderer

Having described the basic concepts of Virtual Texturing at a high level,
we are now ready to take a closer look at specific details of our rendering system.

Our rendering system is part of a small engine that we developed and that runs on Mac
OS X and Linux with OpenGL as its graphics API. Since we cannot discuss the
complete engine in every detail, we will concentrate on implementation details that are
relevant for Virtual Texturing.

3.2.1 Overview 



Figure 3.6 shows a schematic overview of our Virtual Texturing implementation. Let
us discuss the complete system at a high level before we delve into the individual
subparts.

Figure 3.6: Schematic overview of our renderer. The system is divided into a
rendering thread and a streaming thread. Both communicate over two datasets. While the
streaming thread is simply used to load the needed pages from the backing storage, the
rendering thread fulfills three different tasks: management, rendering and analysis.

Our system is divided into two parts: rendering and streaming. Both are implemented
as separate threads in order to prevent the rendering part from being stalled
while it waits for unavailable pages to be loaded.

The rendering thread is based on the ideas that have been proposed by Sean
Barrett [Bar08] and consists of three subparts:

1. Management: Inserts newly streamed pages into the page cache and updates
the page table accordingly. Furthermore, an indirection table is generated, which
represents a compact version of the page table that can be accessed by the shader
during rendering.

2. Rendering: A shader that identifies the set of needed pages and renders the scene
with the help of the indirection table and those texture parts that are currently
available in the page cache. It outputs a standard color image that will be shown
to the user and a second render target, called Needbuffer, which contains the
indices of the needed pages.

3. Analysis: Analyzes the needbuffer to collect information on the pages that have
been marked as needed. This information is further used to calculate priorities
that determine the order in which the pages will be streamed.

The streaming thread, on the other hand, simply consists of a streaming loop that has
access to the Virtual Texture on the hard disk. Its sole task is to load the pages that
have been identified as needed by the rendering part in the appropriate order.

The threads communicate through two datasets that are available to both of them: 

1. Priority Queue: Represents the list of pages that should be streamed. Its order is
based on the priority values that are calculated for each page individually during
the analysis part of the rendering thread.

2. Streamed Pages: After a page has been loaded by the streaming thread, it is
stored in a list within main memory. As soon as the rendering thread reaches
the cache update, its management subpart takes all pages within the list and
inserts them into the page cache.

3.2.2 Management 

Before each frame is rendered we check for newly streamed pages within the Streamed
Pages dataset. If there are any, we modify the page cache and update the page table
in order to keep track of the current cache status.

3.2.2.1 Modifying the page cache

The page cache is simply a two-dimensional RGB texture that can be accessed by the
shader during rendering. For the color depth we use 8 bits per channel, since this
matches the color data stored in our Virtual Texture.



We divide and organize the cache as discussed in Section 3.1.2. But in contrast to
the theory explained there, we use a frame size that is slightly larger than the one we
use to divide the Virtual Texture into pages. Why exactly we have to do this is
explained in Sections 3.2.3.4 and 3.3.1.1.

Updating the cache is very simple. For each new page that is available, we find the
least recently used frame within the cache and copy the texture data with the help of
the OpenGL function glTexSubImage2D into the respective position within the texture.
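The selection of the least recently used frame can be sketched independently of OpenGL. The class below is a minimal illustration under assumptions: the names `PageCacheLru` and `FrameSlot` and the use of a frame counter as a timestamp are hypothetical and are not taken from the implementation described here. The actual texel upload, which the text performs with glTexSubImage2D, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameSlot {
    std::int64_t page = -1;      // absolute page index stored here, -1 if empty
    std::uint64_t lastUsed = 0;  // timestamp of the last access (0 = never used)
};

class PageCacheLru {
public:
    PageCacheLru(std::size_t framesPerRow, std::size_t framesPerColumn)
        : framesPerRow_(framesPerRow), slots_(framesPerRow * framesPerColumn) {
        assert(!slots_.empty());
    }

    // Marks a slot as used at time `now`, for example the current frame number.
    void touch(std::size_t slot, std::uint64_t now) { slots_[slot].lastUsed = now; }

    // Picks the least recently used slot, stores `page` in it and returns the
    // slot index. The page that was evicted (or -1) is written to `evicted`.
    std::size_t insert(std::int64_t page, std::uint64_t now, std::int64_t& evicted) {
        std::size_t victim = 0;
        for (std::size_t i = 1; i < slots_.size(); ++i) {
            if (slots_[i].lastUsed < slots_[victim].lastUsed) victim = i;
        }
        evicted = slots_[victim].page;
        slots_[victim].page = page;
        slots_[victim].lastUsed = now;
        return victim;
    }

    // Frame coordinates of a slot within the cache grid.
    std::size_t frameX(std::size_t slot) const { return slot % framesPerRow_; }
    std::size_t frameY(std::size_t slot) const { return slot / framesPerRow_; }

private:
    std::size_t framesPerRow_;
    std::vector<FrameSlot> slots_;
};
```

A linear scan keeps the sketch short. Since the number of frames is small compared to the number of pages, this may be sufficient, but a linked list or heap ordered by access time would avoid the scan.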

3.2.2.2 Updating the page table 

We use a quadtree to represent the page table in order to exploit the page
hierarchy that has been described in Section 3.1.1.2. It is basically an array that
provides exactly one entry for each page of the Virtual Texture. Every entry, except
that of page 0, has a reference to its parent page, so that we can follow the references
to get data that belongs to the ancestors.

In order to keep track of the current cache status, we store in each page entry the
coordinates $f_x, f_y$ of the frame that contains the page itself or the best available
ancestor. As soon as something in the cache has been modified, we have to update the
page table accordingly. The representation of our quadtree described above has the
useful property that we can do this by just looping over the array that represents it.



for (page = 0; page < MaxPageEntry; ++page)
{
    p = pageTable[page];
    if (p is not in cache and p has a parent)
        modify p to use the frame coordinates stored in parent
}
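The loop above works in a single pass because, in the absolute enumeration scheme, every parent has a smaller index than its children. The following sketch illustrates this under assumptions: the layout of `PageEntry`, the helper `makePageTable` and the row-major page order are hypothetical, and page 0 is assumed to be resident at all times so that a fallback always exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PageEntry {
    bool resident = false;     // true if the page itself is in the cache
    std::uint16_t fx = 0;      // frame of the page or of its best ancestor
    std::uint16_t fy = 0;
    std::uint8_t mip = 0;      // mip level of the page stored in that frame
    std::int64_t parent = -1;  // absolute index of the parent, -1 for page 0
};

// Builds an empty table for mip levels 0..maxMip with the parent links set.
std::vector<PageEntry> makePageTable(unsigned maxMip) {
    assert(maxMip < 12);
    std::vector<PageEntry> table;
    std::int64_t levelStart = 0;    // absolute index of the first page of level m
    std::int64_t parentStart = -1;  // absolute index of the first page of level m-1
    for (unsigned m = 0; m <= maxMip; ++m) {
        const std::int64_t side = std::int64_t{1} << m;
        for (std::int64_t y = 0; y < side; ++y) {
            for (std::int64_t x = 0; x < side; ++x) {
                PageEntry e;
                if (m > 0) e.parent = parentStart + (y / 2) * (side / 2) + x / 2;
                table.push_back(e);
            }
        }
        parentStart = levelStart;
        levelStart += side * side;
    }
    return table;
}

// One ascending pass: parents precede their children in the array, so the
// fallback frame propagates down the hierarchy.
void updatePageTable(std::vector<PageEntry>& table) {
    for (PageEntry& p : table) {
        if (!p.resident && p.parent >= 0) {
            const PageEntry& parent = table[static_cast<std::size_t>(p.parent)];
            p.fx = parent.fx;
            p.fy = parent.fy;
            p.mip = parent.mip;
        }
    }
}
```

Entries of resident pages are expected to be filled in by the cache management before the pass runs; the pass itself only touches entries of pages that are not in the cache.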



3.2.2.3 Generating the indirection table 

Since we cannot use our page table as it is within the shader, we generate a so-called
indirection table. This is simply a two-dimensional texture in which each page is
represented by a texel. We generate it and upload it to the graphics card's memory
directly after the page table has been updated as described above. For each page we
retrieve the frame coordinates that are stored in the page table entry and encode them
into the color channels of the appropriate texel. Furthermore, we store the mip level $m$
of the page that resides within the respective frame.

$$r = f_x \qquad g = f_y \qquad b = m$$

The shader can later use this information to transform the virtual coordinates into
physical ones.
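The channel layout can be illustrated with a short sketch that builds the texel data for a single mip level. It assumes an 8-bit RGB texel format and row-major page order; the type `FrameRef` and both function names are hypothetical, and how the levels are arranged within the final texture is not specified here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame coordinates and mip level taken from a page table entry.
struct FrameRef { std::uint8_t fx, fy, mip; };

// r = fx, g = fy, b = m, as described in the text.
std::array<std::uint8_t, 3> encodeTexel(const FrameRef& f) {
    return {{f.fx, f.fy, f.mip}};
}

// Builds tightly packed RGB8 texel data for one mip level with side x side
// pages. `entries` holds that level's page table entries in row-major order.
std::vector<std::uint8_t> buildIndirectionLevel(const std::vector<FrameRef>& entries,
                                                std::size_t side) {
    assert(entries.size() == side * side);
    std::vector<std::uint8_t> texels;
    texels.reserve(entries.size() * 3);
    for (const FrameRef& e : entries) {
        const std::array<std::uint8_t, 3> rgb = encodeTexel(e);
        texels.insert(texels.end(), rgb.begin(), rgb.end());
    }
    return texels;
}
```

The resulting buffer could then be uploaded as a texture so that the shader reads one texel per page.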

3.2.3 Rendering 

A very important part of our implementation is the shader. Its tasks are

1. Identifying the currently needed set of pages

2. Rendering the scene with the set that is available in the page cache

Both tasks are performed in screen space for every fragment that gets rasterized. We
started out with two shaders, one for each task. This turned out to be
redundant, since we have to know which page is needed in order to retrieve the
information that is used to render the scene with the available set. So we switched
to a unified shader with the help of multiple render targets.

The fragment shader has the following data available as inputs

1. The virtual texture coordinates $s, t \in [0, 1]$ that have been interpolated across the
face.

2. The texture that represents the indirection table. 

3. The texture that represents the page cache. 

3.2.3.1 Identification of needed pages 

The process of identifying the needed page for a fragment can be split into two steps
(see Figure 3.7)

1. Calculation of the mip level

2. Calculation of the page index within the mip level

We calculate the mip level $m$ of the considered fragment by employing the
edge-compression method, which is described in Appendix A.2.

We know from Section 3.1.1.1 that the number of pages within mipmap $m$ is $2^m \cdot 2^m$.
This means that we have $2^m$ pages per row and $2^m$ pages per column. By scaling $s$ and
$t$ accordingly, we can get the relative page coordinates $p_x$ and $p_y$ quite easily.

$$p_x = \mathrm{floor}(s \cdot 2^m) \qquad p_y = \mathrm{floor}(t \cdot 2^m)$$
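The scaling step can be written out as a small function. The sketch below runs on the CPU for illustration only; in the described system this calculation happens in the fragment shader. The clamping of the result, which handles coordinates of exactly 1, and the names `PageCoord` and `pageForFragment` are assumptions that are not taken from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct PageCoord { std::uint32_t x, y; };

// Relative page coordinates of the page that covers (s, t) on mip level m.
PageCoord pageForFragment(float s, float t, unsigned m) {
    assert(m < 24);  // 2^m stays exactly representable as a float
    const float pagesPerRow = static_cast<float>(1u << m);
    const float maxIndex = pagesPerRow - 1.0f;
    // floor(s * 2^m), clamped so that s = 1 maps to the last page instead of
    // one past the end (the clamp is an assumption of this sketch).
    const float px = std::min(std::floor(s * pagesPerRow), maxIndex);
    const float py = std::min(std::floor(t * pagesPerRow), maxIndex);
    return { static_cast<std::uint32_t>(std::max(px, 0.0f)),
             static_cast<std::uint32_t>(std::max(py, 0.0f)) };
}
```

For example, the coordinates (0.3, 0.8) on mip level 2, which has four pages per row, fall into the page with the relative coordinates (1, 3).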
3.2.3.2 Needbuffer bit depth

After we have identified the page, we can store $p_x$, $p_y$ and $m$ in the color channels
of the needbuffer. How this is done in detail depends on the bit depth we are using,
since it determines the maximum number of pages we can encode.




Figure 3.7: (a) Calculation of the mip level for every fragment (b) Identifying the page
within the respective mip level



32 Bit Per Channel: Using 32 bits per color channel allows us to encode more pages
than needed in most cases. So we simply store the data as it is.

$$r = p_x \qquad g = p_y \qquad b = m$$

Although the output encoding stays simple, we have to keep in mind that we later read
the needbuffer back into random access memory in order to analyze it. Using such a
high bit depth is quite inefficient and slows down the performance.



8 Bit Per Channel: With 8 bits per channel we can only encode 256 values per
color channel. This means that we could only address 256 · 256 = 65536 pages per
mipmap if we used the same encoding scheme as in the case of 32 bits. With pages of
128 x 128 pixels, this equates to an upper bound of 32k x 32k for the texture size. So
in order to use larger textures, we additionally make use of the alpha channel.

$$r = p_x \bmod 256 \qquad g = p_y \bmod 256 \qquad b = m$$

The alpha channel $a$ holds the remaining high bits, that is, the quotients
$\lfloor p_x / 256 \rfloor$ and $\lfloor p_y / 256 \rfloor$.

By doing so, we can encode $256^3 = 16777216$ pages per mipmap. This means that we
could actually represent textures of size 524k x 524k.
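One possible packing that reaches $256^3$ pages per mipmap stores 12 bits per page coordinate. The sketch below shows such a packing together with its inverse. The exact bit layout of the alpha channel (high nibble for the x coordinate, low nibble for the y coordinate) is an assumption of this sketch, as are the type and function names.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba8 { std::uint8_t r, g, b, a; };
struct PageId { std::uint32_t px, py, mip; };

// Packs a page into four 8-bit channels. The low 8 bits of each coordinate go
// into r and g, the mip level into b, and the high 4 bits of both coordinates
// share the alpha channel (assumed layout).
Rgba8 encodePage(const PageId& p) {
    assert(p.px < 4096 && p.py < 4096 && p.mip < 256);
    Rgba8 c;
    c.r = static_cast<std::uint8_t>(p.px % 256);
    c.g = static_cast<std::uint8_t>(p.py % 256);
    c.b = static_cast<std::uint8_t>(p.mip);
    c.a = static_cast<std::uint8_t>(((p.px / 256) << 4) | (p.py / 256));
    return c;
}

// Inverse of encodePage, as it would be used when analyzing the needbuffer.
PageId decodePage(const Rgba8& c) {
    PageId p;
    p.px = c.r + 256u * (c.a >> 4);
    p.py = c.g + 256u * (c.a & 0x0Fu);
    p.mip = c.b;
    return p;
}
```

With 12 bits per coordinate a mipmap can hold 4096 x 4096 pages, which at 128 x 128 pixels per page corresponds to the 524k x 524k bound stated above.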



3.2.3.3 Rendering with the available set 

In order to sample the appropriate texel from the page cache, we have to transform the
virtual coordinates $s$ and $t$ into physical ones, which, as stated in Section 3.1.3,
point to the appropriate texture data. To do so we have to answer the following questions:

1. Which cache frame contains the needed texture data (external offset)?

2. Which position within the frame should be sampled (internal offset)?

Both questions can be answered by using the data we stored in the indirection table.
We use the page index we calculated in Section 3.2.3.1 to retrieve $f_x, f_y$ and the mip
level $i$ from the texel that represents the needed page within the indirection table.
The external offset $x_e, y_e$ is simply the start position of the respective frame within
the cache. We can calculate it from $f_x$ and $f_y$ as described in Section 3.1.2.

Calculating the internal offset $x_i, y_i$ is a little trickier, since we have to take the
mip level of the available page into account. Figure 3.8 (d) shows that using the same
internal offset within different mipmaps would yield incorrect results. We know from
Section 3.2.3.1 how to calculate the page within a mip level. When we do the
same for the available mip level $i$ and use the fract operation instead of floor, we get
the correct result

$$x_i = \mathrm{fract}(s \cdot 2^i) \qquad y_i = \mathrm{fract}(t \cdot 2^i)$$

Provided with both the external and the internal offset, we can use $s' = x_e + x_i$ and
$t' = y_e + y_i$ to sample the appropriate texel from the page cache.
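The translation can be sketched on the CPU as follows. Both offsets have to be expressed in the same unit before they are added; this sketch uses normalized cache coordinates, so the internal offset is scaled by the size of one frame. It assumes frames without a border, and the names `IndirectionEntry`, `TexCoord` and `toPhysical` as well as the cache dimensions passed as parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Data read from the indirection table: frame coordinates and the mip level
// of the page that is actually stored in that frame.
struct IndirectionEntry { unsigned fx, fy, mip; };
struct TexCoord { float s, t; };

float fract(float v) { return v - std::floor(v); }

// Translates virtual coordinates into normalized physical cache coordinates.
// framesPerRow and framesPerColumn describe the cache grid (assumed inputs).
TexCoord toPhysical(TexCoord v, const IndirectionEntry& e,
                    unsigned framesPerRow, unsigned framesPerColumn) {
    assert(framesPerRow > 0 && framesPerColumn > 0 && e.mip < 24);
    const float pagesPerRow = static_cast<float>(1u << e.mip);
    // Internal offset within the available page, in units of one page.
    const float xi = fract(v.s * pagesPerRow);
    const float yi = fract(v.t * pagesPerRow);
    // External offset (frame start) plus internal offset, normalized by the
    // number of frames so that the result lies in [0, 1].
    return { (static_cast<float>(e.fx) + xi) / static_cast<float>(framesPerRow),
             (static_cast<float>(e.fy) + yi) / static_cast<float>(framesPerColumn) };
}
```

For a 4 x 4 frame cache, the virtual coordinates (0.625, 0.25) resolved through a mip level 1 page in frame (2, 0) yield the physical coordinates (0.5625, 0.125). With bordered frames, as used for filtering, the internal offset would additionally have to be scaled and shifted by the border width.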

3.2.3.4 Filtering 

Bilinear: We tried two ways to incorporate bilinear filtering. We started with a
manual implementation within our shader, which ended up being prohibitively slow. The
better way to get bilinear filtering in a shader is to use the implementation that is
provided by the chosen rendering API, so that the interpolation of the texel data is done
automatically during sampling. But doing so is not without its problems. Figure 3.9 (a)
shows that it can lead to incorrect interpolation results and hence to
visual seams when a texel is sampled that lies at the border of two neighbouring pages
within the cache. We can circumvent this problem by providing a small pixel
border around every page within the cache. This border contains texel data from the
neighbouring pages within the Virtual Texture, so that we can sample the data as if
the complete Virtual Texture were available. For this reason we have to store
slightly larger pages within the page cache. This does not contradict the theory
explained in Section 3.1.1.1, because we still operate logically on pages of the chosen
page size and just use the border as a workaround. How we get this border is described
in Section 3.3.1.1.

Figure 3.8: (a) virtual coordinates $s, t$ (b) we make a page table look-up to get the
external offset (green) and calculate the internal offset (blue) (c) doing so for every
fragment yields the final image (d) Example of why the internal offset depends on
the mip level: We want to sample from a page within mipmap 2, but the needed texture
data is only available as a page from mip level 1. Using the same internal offset would
yield incorrect results.

Figure 3.9: (a) The problem of bilinear filtering. Sampling texels at the border of two
pages within the page cache can lead to an interpolation of quite different colors and
hence introduce visual seams at page borders within the rendered image. (b) Visual
edges at the border between two neighbouring mip levels. Trilinear filtering diminishes
this effect.

Trilinear: Barrett [Bar08] proposes an implementation of trilinear filtering that
relies on two caches, so that it can make use of the rendering API's implementation.
Mittring [MG08] states that this approach is really complicated and a waste of
memory, since it stores identical texture data twice. We decided to take the obvious
route that can be manually implemented within the shader. Since we have bilinear filtering
already available, we can be sure that every time we sample from the cache, we will
get a correctly bilinear filtered result. So in order to use trilinear filtering, we simply
sample from two adjacent mipmaps and use a linear interpolation to mix both results.



Using trilinear filtering leads to visually better results, as shown in Figure 3.9 (b), 
since it diminishes visible edges at the border of neighbouring mip levels. 
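The manual mixing step can be illustrated outside of a shader. The following is a minimal CPU-side sketch in Python, not the shader code of our system: it assumes that each mip level is a plain 2D list of grey values and leaves out the page table look-up entirely. The function names bilinear and trilinear are hypothetical.

```python
import math

def bilinear(image, u, v):
    """Bilinearly sample a 2D list of floats at normalized coordinates (u, v)."""
    h, w = len(image), len(image[0])
    # Map to texel space; texel centers sit at half-integer positions.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def trilinear(mip_chain, u, v, lod):
    """Mix the bilinear samples of the two mip levels adjacent to lod."""
    lod = min(max(lod, 0.0), len(mip_chain) - 1.0)
    lower = int(math.floor(lod))
    upper = min(lower + 1, len(mip_chain) - 1)
    fraction = lod - lower
    a = bilinear(mip_chain[lower], u, v)
    b = bilinear(mip_chain[upper], u, v)
    # Linear interpolation between the two bilinear results.
    return a * (1 - fraction) + b * fraction
```

In the real system both samples would come from the page cache, so each of them needs its own page table look-up and its own internal offset.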

3.2.4 Analysis 

After the scene has been rendered, we update the screen and analyze the needbuffer
in order to find out which pages need to be loaded and how important they are for the
currently visible scene.

3.2.4.1 Evaluating the needbuffer 

In order to analyze the needbuffer, we read the corresponding image data from the
graphics card into main memory. This is the point at which using 32 bits per
color channel can become inefficient, because a lot more data has to be
downloaded from texture memory.

As soon as a copy of the needbuffer resides in main memory, we loop
over all its pixels. For each pixel we transform the extracted relative page coordinates
$(p_x, p_y)$ into the absolute index $p_{abs}$ and mark the respective page within the page table
as needed. Furthermore we collect additional information that can be classified into
two sets:

1. Global information like cache hits and misses. 

2. Local information regarding the currently considered page, like the number of 
pixels that need it. 

While the first set can be used to track the performance of our system at runtime,
we store the page-specific information within the appropriate page table entries.

After the complete needbuffer has been analyzed we employ a page priority heuristic 
for the purpose of measuring the importance of each needed page. 
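The analysis loop can be sketched as follows. This is only an illustration under stated assumptions: each needbuffer pixel is taken to be either None (no Virtual Texture visible) or an already decoded tuple (mip, px, py), mip level 0 is assumed to be the finest level, and the index layout in absolute_index is a hypothetical one that merely has to agree with the file layout. Counting hits and misses per pixel is likewise an assumption.

```python
def absolute_index(mip, px, py, pages_per_side_at_mip0):
    """Hypothetical mapping: levels stored consecutively, row-major inside a level."""
    index = 0
    side = pages_per_side_at_mip0
    for _ in range(mip):
        index += side * side   # skip all pages of the finer level
        side //= 2             # the next level has half as many pages per side
    return index + py * side + px

def analyze_needbuffer(needbuffer, resident_pages, pages_per_side_at_mip0):
    """Return global statistics and the per-page pixel counts."""
    stats = {"hits": 0, "misses": 0}
    pixel_counts = {}
    for pixel in needbuffer:
        if pixel is None:
            continue
        mip, px, py = pixel
        p_abs = absolute_index(mip, px, py, pages_per_side_at_mip0)
        # Local information: how many pixels need this page.
        pixel_counts[p_abs] = pixel_counts.get(p_abs, 0) + 1
        # Global information: is the page already in the cache?
        if p_abs in resident_pages:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
    return stats, pixel_counts
```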

3.2.4.2 Page Priority Heuristic 

We implemented a set of heuristics that calculate a priority for each needed
page based on the information we gain during the analysis. The assigned priority of a
page determines its position within the priority queue, which ensures that the most
important pages get streamed first. Since we stored the page-specific information
within the page table entries, we can further use the heuristics in combination with
the page hierarchy to estimate the priority of the ancestor pages, if we choose to
implicitly stream them as well. A more detailed description of the heuristics is
given in Section 4.2, where we discuss their impact on the rendering quality.



3.2.5 Streaming 

Although it would be the starting point for practical performance optimizations, tuning
the streaming thread was out of scope, and hence we stayed with a very simple
implementation for our system. Furthermore we did not incorporate any form of real-time
texture decompression [vW06], which certainly would influence the design
of the streaming part.

Our streaming thread is simply a loop that accesses the Virtual Texture on the hard
disc. In every iteration it fetches the needed page with the highest priority from the
priority queue. The fetched texture data is then stored in a list called streamed pages,
so that the management part of the rendering thread can update the cache as soon as
possible. How exactly we access the Virtual Texture on the hard disc and extract the
needed texture data depends on the file format we chose.
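The body of such a loop might look like the sketch below. It is written as a plain function instead of a thread so that it can be run in isolation, and it assumes a heap of (negated priority, page index) tuples as the priority queue. The names stream_pages, read_page and max_pages are hypothetical.

```python
import heapq

def stream_pages(priority_queue, read_page, streamed_pages, max_pages):
    """Fetch up to max_pages pages, highest priority first.

    priority_queue is a heapq heap of (-priority, page_index) tuples; the
    priority is negated because heapq always pops the smallest element.
    """
    loaded = 0
    while priority_queue and loaded < max_pages:
        _, page_index = heapq.heappop(priority_queue)
        # Hand the data over to the rendering thread via the streamed pages list.
        streamed_pages.append((page_index, read_page(page_index)))
        loaded += 1
    return loaded
```

A real streaming thread would additionally have to protect the queue and the list against concurrent access by the rendering thread.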

3.2.5.1 Virtual Texture file format 

Instead of using a traditional image layout for the resulting texture, we designed a simple
file format that seemed more practical for the task of streaming the pages. As you
can see in Figure 3.10, we store the pages of all mipmaps in sequential order inside the
same file. This approach is beneficial, since every page represents a standalone unit
within the file and could be accompanied by a multitude of other useful information
that could be exploited during rendering. Another nice property is the simplicity with
which we can reach every page within the file.

Figure 3.10: All pages of all mipmaps are stored in a sequential order.



Let $b$ denote the number of bytes per pixel. By knowing the absolute index $p_{abs}$ of
the page we want to read, we can simply calculate the offset $o$ within the file as

\[ o = \mathit{size}_{fileheader} + (w_{page} \cdot h_{page} \cdot b) \cdot p_{abs} \]

In contrast to a traditional image layout, where we would have to extract multiple pixel 
rows, we end up using fewer file operations in order to read a page. 
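As a small illustration, the offset computation and the resulting single read can be written as follows. The function and parameter names are hypothetical, and the header size is simply passed in, since the header contents are not specified here.

```python
def page_offset(p_abs, page_width, page_height, bytes_per_pixel, header_size):
    """Byte offset of page p_abs in a file that stores all pages back to back."""
    return header_size + page_width * page_height * bytes_per_pixel * p_abs

def read_page(stream, p_abs, page_width, page_height, bytes_per_pixel, header_size):
    """Read one page with a single seek and a single read."""
    size = page_width * page_height * bytes_per_pixel
    stream.seek(page_offset(p_abs, page_width, page_height,
                            bytes_per_pixel, header_size))
    return stream.read(size)
```

For pages of 128 x 128 pixels with 4 bytes per pixel and a 16 byte header, page 3 starts at byte 16 + 65536 * 3 = 196624.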
3.3 Tool chain



During the course of this thesis we had to develop a set of purpose-built tools, because
existing image and geometry processing software cannot operate on textures of
multiple gigabytes. Since the development of an interactive modeling
application could not be accomplished within the scope of this thesis, we concentrated on
a solution that allows us to compile the output of traditional tools into a dataset that
can be used within the rendering system we discussed in Section 3.2.



3.3.1 Texture Creation : vtmtc 

To create textures of the anticipated sizes, we developed a tool called vtmtc. It has 
been designed to construct the resulting texture in a series of smaller steps and do as 
much preprocessing as needed for the purpose of simplifying the rendering system. 



Figure 3.11: The vtmtc pipeline for creating Virtual Textures. A layout description and
a set of unique textures are compiled into a Virtual Texture.



As you can see in Figure 3.11, everything starts out with a set of unique images and
a layout that contains an exact position for each of them. Provided with this information,
vtmtc creates a traditional two-dimensional texture that represents the highest
resolution mipmap of the Virtual Texture. This texture is then scaled down to
the next smaller mipmap size by employing bilinear filtering. We repeat this step with
every mipmap until we reach the one that exactly matches the size of one page.
The last processing step is the construction of a file that conforms to the format we
discussed in Section 3.2.5.1. We do so by cutting out one page at a time from the
created set of mipmaps and storing the pages sequentially in the file stream.



3.3.1.1 Page-Border Creation 



As we discussed in Section 3.2.3.4, a pixel border around the pages is needed in order
to prevent the rendering API's implementation of bilinear filtering from filtering across
the borders of neighbouring pages within the page cache. Looking again at the file
format we introduced in Section 3.2.5.1, it seems obvious to add these borders during
the creation of the texture and simply save pages of a slightly greater size, so that
they also contain a small amount of the surrounding texel data that belongs to their
neighbouring pages.

We get the borders more or less for free by simply enlarging the rectangle that
determines the area that will be cut out of a mipmap.
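A minimal sketch of the enlarged cut-out rectangle is given below. It assumes that a mipmap is a 2D list of texels and that coordinates falling outside the mipmap are clamped to its edge. The clamping is an assumption made for this example, since the handling of pages at the texture border is not specified here. The name cut_page_with_border is hypothetical.

```python
def cut_page_with_border(mip, page_x, page_y, page_size, border):
    """Cut out one page plus a border of texels taken from its neighbours."""
    height, width = len(mip), len(mip[0])
    # Enlarge the rectangle by the border on every side.
    x0 = page_x * page_size - border
    y0 = page_y * page_size - border
    side = page_size + 2 * border
    rows = []
    for y in range(y0, y0 + side):
        cy = min(max(y, 0), height - 1)          # clamp at the texture edge
        rows.append([mip[cy][min(max(x, 0), width - 1)]
                     for x in range(x0, x0 + side)])
    return rows
```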



3.3.1.2 Noise Value Calculation 

During our investigation of page priority heuristics, which will be discussed in Section 4.2,
we had the idea to provide the rendering system with a metric that
indicates how much the quality will increase if it uses the child page instead of its parent
during rendering.



The page hierarchy in Section 3.1.1.2 shows that every child represents exactly one 
fourth of its parent page in a higher resolution. As a consequence of this we can only 
guarantee meaningful results if we compare the currently considered child with the 
correct quarter. 



RMSE = 20.5896

Figure 3.12: Example for estimating the quality improvement the system would gain
if it used the child instead of its parent page.

As shown in Figure 3.12, we achieve this by determining the correct part within the parent
and scaling it up to the size of one page. This upsampled quarter and the child
are then transformed into luminance in order to calculate the root mean squared error
(see Appendix A.1.1), which we chose as the metric to estimate the quality impact.
Since the rendering system needs these NoiseValues, as we call them, to
decide which page will be streamed next, we cannot provide them alongside the page
data within our texture file format. So an extra file is generated that can be read
during the initialization of the Virtual Texture.
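The computation of a single NoiseValue can be sketched as follows. Two details are assumptions of this example and not taken from our tool: the luminance uses the common Rec. 601 weights, and the parent quarter is upsampled with nearest-neighbour filtering to keep the code short. Pages are square 2D lists of (r, g, b) tuples.

```python
import math

def luminance(rgb):
    """Luminance of an (r, g, b) tuple, using Rec. 601 weights (an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def noise_value(parent, child, quadrant_x, quadrant_y):
    """RMSE between a child page and the upsampled quarter of its parent.

    quadrant_x and quadrant_y (each 0 or 1) select the quarter of the parent
    that the child covers.
    """
    size = len(child)
    half = size // 2
    ox, oy = quadrant_x * half, quadrant_y * half
    total = 0.0
    for y in range(size):
        for x in range(size):
            # Nearest-neighbour upsampling of the parent quarter (assumed filter).
            upsampled = parent[oy + y // 2][ox + x // 2]
            diff = luminance(upsampled) - luminance(child[y][x])
            total += diff * diff
    return math.sqrt(total / (size * size))
```

A child that is identical to the upsampled quarter yields a NoiseValue of zero, which is the case for the perfectly uniform texture.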
3.3.2 Geometry retexturing : vtgeo

Due to the large number of projects that have been textured in a more traditional sense,
it seems worthwhile to find a method that can automatically transform the texture coordinates
of the provided geometry in such a way that each of its faces can be textured uniquely.
One of our tools, called vtgeo, does just that: it takes multiple already textured polymeshes,
embeds them into one of our Virtual Textures, and provides an individual
area for each face.

Although a little outdated, we chose the bsp file format from Quake 3 Arena as a
source for complex geometry, because a plethora of freely available maps exists
on the internet. A schematic overview of vtgeo is shown in Figure 3.13. The starting
point is a set of mesh files that are textured with images from a given pool of source
textures. At runtime vtgeo processes the meshes one by one. For each
polymesh it loops through the set of faces and fetches the texture that the triangles of
the face are textured with. Since we want to texture every face uniquely, the texture is
duplicated and saved on the hard disc. These copies can be modified with traditional
image processing tools in order to make each face look unique.

Figure 3.13: Overview of the vtgeo pipeline (source textures, mesh files, vtgeo,
duplicated textures, unique textures, layout description).

Special care has to be taken with multiple texture repetitions across a face. The bsp
file format, for example, allows texture coordinates that lie outside the range
[0, 1]. Quake 3 Arena handles these coordinates during rendering by simply repeating
the texture in order to sample the appropriate texel. The problem is that such a repetition
does not make sense in the case of Virtual Texturing, where we want to texture
every part of a face in a unique fashion. We avoided this problem by calculating the
number of repetitions across a face, copying the source texture that many
times into the face-specific duplicate, and normalizing the texture coordinates of the
respective vertices accordingly.
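One way to compute the repetitions and the normalized coordinates is sketched below. Rounding the covered range outwards to whole repetitions is an assumption of this example, and the name unwrap_repetitions is hypothetical.

```python
import math

def unwrap_repetitions(coords):
    """Count texture repetitions across a face and normalize its coordinates.

    coords is a list of (s, t) tuples, one per vertex, possibly outside [0, 1].
    Returns (reps_s, reps_t, normalized), where normalized lies in [0, 1].
    """
    s_min = math.floor(min(s for s, _ in coords))
    t_min = math.floor(min(t for _, t in coords))
    s_max = math.ceil(max(s for s, _ in coords))
    t_max = math.ceil(max(t for _, t in coords))
    # The duplicate contains at least one copy of the source texture per axis.
    reps_s = max(1, s_max - s_min)
    reps_t = max(1, t_max - t_min)
    normalized = [((s - s_min) / reps_s, (t - t_min) / reps_t)
                  for s, t in coords]
    return reps_s, reps_t, normalized
```

The face-specific duplicate then consists of reps_s times reps_t copies of the source texture.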

While it fetches and duplicates the textures, vtgeo creates a layout file that can
later be used with vtmtc in order to compile a Virtual Texture from the pool of unique
textures. We will now describe how such a layout is generated and in which way the
geometry has to be modified, so that it can be used with a texture that stems from such
a layout.

3.3.2.1 Layout Creation & Geometry Embedding 

After the system has generated copies of all the referenced source textures, it has to find a
layout for the highest resolution mipmap so that each face is provided with an individual
area. In order to determine these individual areas, it starts out with an empty layout
that represents the texture to be created. The layout consists of a two-dimensional grid
of entries, each of which has exactly the size of one page within the texture.



Figure 3.14: (a) Every grid entry has the size of one page. We estimate the number
of entries a unique texture will fill in the layout. (b) We employ a first fit algorithm to
find the next free space that is big enough to contain the unique texture.

For each face vtgeo estimates, as shown in Figure 3.14 (a), the number of grid entries
that the corresponding unique texture will fill horizontally and vertically. Knowing
this size, measured in entries, a first fit algorithm is employed to find the next free
area that is big enough to contain the texture. If there is no appropriate area available,
vtgeo resizes the layout so that it represents a texture of the next higher resolution.
After vtgeo has found an area for each face in the geometry, the layout is ready to be used
for creating the Virtual Texture. But in order to use the texture in combination with
the given meshes, a further processing task has to be done: transforming the texture
coordinates of the vertices so that they coincide with the areas within the texture.
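The first fit search on the grid can be sketched as follows. The sketch makes two simplifying assumptions that may differ from vtgeo: the grid is scanned in row-major order, and whenever a texture does not fit, the layout is doubled in side length and all textures are placed again from scratch. The function names are hypothetical.

```python
def first_fit(grid, width, height):
    """Occupy and return the first free width x height block as (col, row)."""
    rows, cols = len(grid), len(grid[0])
    for row in range(rows - height + 1):
        for col in range(cols - width + 1):
            cells = [(row + dy, col + dx)
                     for dy in range(height) for dx in range(width)]
            if all(not grid[r][c] for r, c in cells):
                for r, c in cells:
                    grid[r][c] = True   # mark the entries as used
                return col, row
    return None

def build_layout(sizes_in_entries, initial_side=1):
    """Place all textures; grow the square grid until everything fits."""
    side = initial_side
    while True:
        grid = [[False] * side for _ in range(side)]
        placements = []
        for width, height in sizes_in_entries:
            spot = first_fit(grid, width, height)
            if spot is None:
                break                   # does not fit, retry with a larger grid
            placements.append(spot)
        else:
            return side, placements
        side *= 2
```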

With the starting point of the area within the destination texture and the sizes of both
the individual and the destination texture, we can transform the texture coordinates.
Let $x$ and $y$ denote the starting point of the area, while the width and height of the
source and destination textures are represented by $w_i, h_i$ and $w_d, h_d$. Let us further
assume that these values are measured in pixels and so can be outside of [0, 1]. We can
then transform the texture coordinates $s, t \in [0, 1]$ of a vertex so that they address
the assigned area within the destination texture.
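A transformation that is consistent with these definitions can be sketched as follows. It assumes that $s$ and $t$ have already been normalized to [0, 1], and the primed symbols $s'$ and $t'$ are names introduced here for the resulting coordinates within the destination texture:

```latex
\[
  s' = \frac{x + s \cdot w_i}{w_d}, \qquad
  t' = \frac{y + t \cdot h_i}{h_d}
\]
```

For example, a texture of 256 x 256 pixels that starts at $x = 512$ in a destination texture of width 1024 maps $s = 0.5$ to $s' = (512 + 128) / 1024 = 0.625$.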
3.3.2.2 Problems of this approach

Although this approach worked quite well for our needs, we identified two problems
that can occur.

Figure 3.15: Problems of our approach. (a) We waste space by copying the complete
texture, although the face needs just a small area. (b) Using textures whose size is not a
multiple of the page size will lead to unused areas within the layout.



No coherence Faces that are close to each other within the geometry can be placed 
at totally different places within the texture layout. Because of this lack of coherence, 
we can end up wasting time on loading more pages than required in the optimal case. 

Wasted space Our approach can waste quite a lot of texture space by inserting
texture data that will never be used during rendering. We identified two
situations that can lead to this and visualized them in Figure 3.15.



1. Some faces may reference only a small part of the texture. Since our algorithm 
simply duplicates the original image, it can happen that it reserves far too much 
space for a single face. 

2. As described above, the algorithm tries to construct a layout by using a grid
of entries in which each cell has exactly the size of one page. If textures are
used whose width or height is not a multiple of this page size, we end
up wasting space, because a complete entry is marked as needed
although it is not completely filled.

Since this is a common and hard problem in graphics, and since our approach worked 
sufficiently well, we did not invest more time into optimizing the algorithm used for 
layout generation. 
3.4 Summary & Result

In the course of this chapter we discussed our implementation of a renderer and the
accompanying tool chain. As a result, we are able to take existing geometry and texture
each of its faces uniquely. These textures are compiled into a Virtual Texture that can
be used by our rendering system. Figure 3.16 shows some example screenshots of
a map that was modified to contain unique features. The remaining problem of our
overall texturing process is that, because our tool chain is non-interactive,
we cannot say exactly which copied texture belongs to which face in the geometry.




Figure 3.16: A Quake 3 Arena map whose unique textures were modified to show 
paintings by George Grosz. The pentagram textures show that we can really give each 
instance a unique look. 
During the implementation we came across several problems that seemed worthwhile
to investigate.

Because of camera motions like rotations and translations, pages will become visible
that are not currently in memory. Figure 4.1 shows an example of this. This
effect becomes quite disturbing, especially on systems that suffer from high latencies,
like internet applications or mobile devices, since we can load only a small number of
pages per frame. For this reason, it seems obvious to collect information about the
importance of pages for the purpose of selecting those that have to be streamed first.
We investigated this idea and discuss some of our results in Section 4.2.



But even if we could improve the quality by choosing the pages cleverly, we would
still experience an effect called LOD Snap [vW09], which is exemplified in Figure 4.2.
It basically means that a low resolution page that is shown in the current frame is
replaced by its high resolution counterpart within the next one. During the implementation
it became obvious to us that we can weaken the disturbance of this effect
by streaming the ancestors of a page in advance. We tried several ideas and discuss
them in Section 4.3.



While we use a particular indoor scene to discuss the results in Section 4.2, we
also made tests on terrain scenes. Due to the different nature of both scene types, we
have to take their unique properties into account. We discuss some of these differences
and their impact on the quality in Section 4.4.



4.1 Evaluation method 

To evaluate the performance of several system configurations, we started with traditional
metrics, like measuring cache hits. While useful in the case of operating
systems, it became evident that these cannot be used in our case, since we get no
feedback on the visual quality.

Figure 4.1: Camera motions like rotations will modify the current point of view. As
a result, pages that have not been visible until now will appear at the borders of the
screen. The system will use lower resolutions as a fallback until the needed pages
become available.

Figure 4.2: Example of a so-called LOD Snap. A low resolution page is replaced by
its high quality counterpart within the course of one frame.



4.1.1 Quality measurement 

We settled on the idea to use quality assessments in order to see if one configuration
leads to visually better results than another. The basic idea is to measure the difference
between a sequence of reference images that are rendered with the complete set of
needed pages available, and one that is the result of using a specific system configuration.
In order to get comparable results that are not distorted by the unpredictable
behaviour of the streaming thread, we give every configuration a fixed limit of pages
that it can load per frame. This is based on the assumption that a good configuration
will select the pages within this budget more cleverly and thus will lead to better
results.

We started with root mean squared error (RMSE) as a quality indicator, but added
Structural Similarity (SSIM) [WBS+04] a short time later, because it takes human
image perception into account and hence could lead to more interesting results. Both
of these full reference methods are further discussed in Appendix A.1.



4.1.2 Testcases 

We tested different configurations of our system in fly-through scenes of indoor levels 
and terrains. This seemed interesting, since we could analyze their performance in 
more or less practical situations, where motion and complex geometry are predomi- 
nant. 

In order to understand the behaviour and influence of ancestor streaming strategies,
we do not have to take motion into account. We therefore chose different spots
within a scene and fixed the camera. Then we allowed the system to stream one page
per frame, so that we could take a close look at the influence of each single page that
has been loaded.



4.2 Page Priority Heuristics 



As mentioned in Section 3.2.4.1, we analyze the needbuffer in order to get the set of
needed pages and estimate which page would be the best to stream. The complete set
is ordered in a priority queue, which sorts the pages based on their priority values. In
order to calculate this priority for each page, we employ a set of different heuristics
that base their decisions on different kinds of information.

4.2.1 Discussed Dataset 

As an example during this discussion we will use a specific flythrough that we made
in an indoor level. Figure 4.3 shows keyframes of the reference images in order
to make the discussion more comprehensible. Two neighbouring keyframes represent
a specific motion that is identified by the black number within the box between them.
The upper red number represents the frame within our flythrough at which the motion
starts. The lower red number, on the other hand, marks the resulting keyframe. If the
lower red number does not coincide with the upper red number of the next box, then
the camera has not been moved in the meantime.




Figure 4.3: Keyframes of the dataset we want to discuss. The black numbers in the
boxes are used to refer to a specific motion, while the red ones identify the starting and
ending frame of a motion.



We will concentrate in our discussion on two specific sections separately. 

Motion 1 up to 6 will be used in the discussion of the LookAhead Camera. The 
interesting point is motion 1, which represents a very fast rotation that immediately 
stops after just one frame. We can interpret this as kind of a worst case, because we 
can not handle it properly at all. 2 + 3: After a short stop we slightly rotate around the 
y-axis and translate in z direction. 3 + 4 + 5: We rotate slowly around the y-axis. 6: 
We move to the wall that is now in front of the camera. 

Motion 7 up to 15 will be used to discuss the performance of several heuristics. It
contains camera motions that are likely to appear in a real-world application. 7: We
translate in the negative x direction along a wall that is in front of the camera. 8 + 9 +
10: We rotate quite fast around the y-axis and very slowly around the x-axis in order
to see the corridor. 11 + 12: We move in z direction through the corridor. 13 + 14: We
rotate slowly around the y-axis. 15: We move to the ramp in front of the camera.

Every configuration we tested had the same set of pages at the start within its cache: 
the first three mip levels and all the pages that are visible from the starting position of 
the flythrough (see first keyframe). 

4.2.2 Basic Heuristics 

We started out with a set of three basic heuristics. 



Random We assign each page a priority that is generated uniformly at random. 
Strictly speaking Random is no heuristic at all. It simply represents the case that no 
heuristic is used. 



PixelSum During the analysis of the needbuffer we count the pixels on screen that
need the same page. A larger number of pixels yields a higher priority.

Distance Alongside the needbuffer we read out the depth-buffer and compute the 
mean distance for a visible page. A high priority is represented by a small distance. 
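The three basic heuristics can be expressed as small priority functions. The sketch assumes that the analysis has stored, for every needed page, the number of pixels that need it and the sum of their depth values. The dictionary keys pixels and depth_sum as well as the function names are hypothetical.

```python
import random

def random_priority(page_stats, rng=random):
    """Random: no heuristic at all, every page gets a uniformly random priority."""
    return rng.random()

def pixel_sum_priority(page_stats):
    """PixelSum: more pixels on screen yield a higher priority."""
    return page_stats["pixels"]

def distance_priority(page_stats):
    """Distance: a smaller mean distance yields a higher priority."""
    mean_distance = page_stats["depth_sum"] / page_stats["pixels"]
    return -mean_distance

def order_pages(stats_by_page, heuristic):
    """Return the page indices sorted from highest to lowest priority."""
    return sorted(stats_by_page,
                  key=lambda page: heuristic(stats_by_page[page]),
                  reverse=True)
```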



Figure 4.4 shows a comparison of the three heuristics within motion set 7 up to 15.
Since Random performs badly compared to PixelSum and Distance, we
can conclude that streaming the pages in a non-random order actually leads to better
results and hence has an impact on the visual quality. Furthermore we see that PixelSum
yields the best results in a setup where only a small number of pages can be loaded and
that this trend holds if we allow every configuration to stream twice as many pages.
Figure 4.5 shows the results of PixelSum again, where we used two red boxes to highlight
large peaks. These peaks represent huge quality losses that are caused by rotations like
the motions 9 and 14. This means that it is highly relevant to diminish the effect of
these rotations in order to improve the quality.



4.2.3 Advanced Heuristics 

Since PixelSum yielded the best results we used it as a baseline for more advanced 
heuristics. 



4.2.3.1 Weighted PixelSum 

Building on PixelSum we had the idea to use a center within the screen space to weight 
the number of pixels. From this we derived the following two heuristics: 
Figure 4.4: Comparison of the basic heuristics within motion set 7 up to 15. Streaming
pages in a specific order yields better results than selecting them randomly. PixelSum
works pretty well.

WeightedPixel This heuristic stems from the assumption that the content at the center
of the screen is more important than what is seen at the periphery. The heuristic
works quite similarly to PixelSum, but each pixel that is accumulated gets weighted by
its radial distance to the screen center.



HotSpot The idea is to use the strength of WeightedPixel, but to get better results
in the case of rotations. Rotations will cause pages to appear at specific sides of the
screen, while others will leave it at the opposite ones. We therefore take the rotation
into account and shift the center towards the borders of the screen based on the rotation.
See Figure 4.6 for a schematic example of this idea.
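Both heuristics can be sketched with a shared weighting function. The exact weighting function and the mapping from rotation to center shift are not specified here, so the linear falloff, the gain parameter and the sign convention of the shift are assumptions of this example, and all names are hypothetical.

```python
import math

def weight(x, y, center, width, height):
    """Assumed linear falloff: 1 at the center, smaller with growing distance."""
    max_distance = math.hypot(width, height)
    distance = math.hypot(x - center[0], y - center[1])
    return 1.0 - distance / max_distance

def hotspot_center(width, height, yaw_delta, pitch_delta, gain):
    """Shift the weighting center according to the rotation of the last frame.

    gain converts the rotation into pixels; the result is clamped to the
    screen. Which sign moves the center to which side is an assumption.
    """
    cx = width / 2 + gain * yaw_delta
    cy = height / 2 + gain * pitch_delta
    return (min(max(cx, 0.0), width - 1.0), min(max(cy, 0.0), height - 1.0))

def weighted_pixel_priorities(needbuffer, width, height, center):
    """Accumulate weighted pixels; needbuffer maps (x, y) to a page index."""
    priorities = {}
    for (x, y), page in needbuffer.items():
        priorities[page] = (priorities.get(page, 0.0)
                            + weight(x, y, center, width, height))
    return priorities
```

WeightedPixel corresponds to using the unshifted screen center, while HotSpot passes the result of hotspot_center as the center.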



Figure 4.7 (a) shows the performance of WeightedPixel in comparison to PixelSum.
Unfortunately WeightedPixel does not have as great an impact on the quality as we hoped
for, but it actually does a slightly better job in the case where the camera moves down
the corridor (motion 11 & 12). This is due to the nature of the heuristic itself, since it
gives pages at the screen center a higher priority. In such a motion through a corridor
the pages at a high distance reside near the screen center, while those on the
walls on both sides of the camera span larger areas. WeightedPixel prefers those that
reside at the center and so does exactly the right thing, because these pages will stay
visible for a longer period of time.

Figure 4.5: Results of PixelSum. The red boxes mark two large peaks that represent
huge quality losses. These performance hits are caused by rotations.

Figure 4.6: We shift the weighting center based on the rotation that occurred. A rotation
around the x-axis (blue) and a rotation around the y-axis (green) will cause the
center to be shifted to its final position (red).



Figure 4.7 (b), on the other hand, compares HotSpot with both WeightedPixel and PixelSum.
As we assumed, HotSpot uses the strength of WeightedPixel in the case of the
corridor, but also does a better job in the case of slow rotations (see motion 13 up to
16). Unfortunately, in the case of fast rotations it suffers from the same problem as all
the other heuristics.
We also made tests with a sliding HotSpot in which the amount of the shift depends
on how fast the camera rotates. Although it seemed to be a more sophisticated idea, it
did not lead to better results compared to WeightedPixel.

Figure 4.7: Both panels use 5 pages per frame. (a) WeightedPixel does not perform as
well as we hoped, but does a better job in the case of the translation through the
corridor. (b) HotSpot exploits the strengths of WeightedPixel and does a better job in
the case of slow rotations.



Weighted Structural Similarity When we introduced WeightedPixel, we mentioned
our assumption that the content at the screen center is more important than what is visible
at the periphery. If the viewer looks at the center of the screen most of the time,
then this assumption should be true. Unfortunately neither SSIM nor RMSE takes
this screen weighting directly into account. Thus we extended structural similarity to
weight the result of each compared window based on the distance of the window's
midpoint to the screen center. Figure 4.8 again shows a comparison of PixelSum and
WeightedPixel, but this time based on the results of our extension WSSIM. Although
the scale of the results changes, we get curves that are quite similar to those
yielded by SSIM. WeightedPixel again does not lead to overwhelmingly better results,
but both heuristics become more distinguishable. In contrast to Figure 4.7 (a), PixelSum
performs better overall, with the exception of the fast rotation in motion 9. We
mention this point for the sake of completeness. Although we employed SSIM
during our study and use it throughout this discussion, it could be very valuable to use
a quality assessment that directly takes the screen weighting into account.
Figure 4.8: Using WSSIM in order to take the importance of the screen center into
account. The performance of PixelSum and WeightedPixel becomes a bit more
distinguishable. (Horizontal axis: frame.)



A common problem of the heuristics discussed so far remains: It can happen that they
select pages that provide little or no improvement over those that are currently available.
The next section features a technique that tries to overcome this problem by taking the
content of the pages into account.



4.2.3.2 NoiseValue 

A perfectly white wall will stay white, no matter how close the viewer gets. Even small
amounts of noise will hardly be detectable and there will be little change between mip
levels, whereas a very grainy rock texture will exhibit a lot of change on every mip
level. In order to take this fact into account, we had the idea, as previously stated in
Section 3.3.1.2, to provide the system with an indicator of how much the texture will
change if it uses the child page instead of its parent during rendering.



In order to test this idea we extended the aforementioned heuristics to consider this so 
called NoiseValue: We accumulate all the NoiseValues between the considered page 
and the currently available ancestor. This sum is then used to scale the priority. 
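The accumulation along the page hierarchy can be sketched as follows. The sketch assumes that the NoiseValue of a page describes the change relative to its parent, that parent_of maps every page to its parent (None for the root), and that the available ancestor itself is not included in the sum. All names are hypothetical.

```python
def accumulated_noise(page, resident_pages, parent_of, noise_value):
    """Sum the NoiseValues from page up to, but excluding, the resident ancestor."""
    total = 0.0
    current = page
    while current is not None and current not in resident_pages:
        total += noise_value[current]
        current = parent_of.get(current)
    return total

def scaled_priority(base_priority, page, resident_pages, parent_of, noise_value):
    """Scale the priority of a basic heuristic by the accumulated NoiseValue."""
    return base_priority * accumulated_noise(page, resident_pages,
                                             parent_of, noise_value)
```

A page whose accumulated NoiseValue is zero receives a priority of zero and is therefore streamed last.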

We constructed a level that is based on the example given above and in which the
NoiseValue should be relevant: We selected one frequently used source texture that spans
large areas within the level discussed so far and exchanged it with another that does
not contain any noise. See Figure 4.9 for screenshots of this idea.

Figure 4.9: We chose a source texture that is used by many pages and that spans large
areas on different mip levels. We exchanged this texture so that the NoiseValues of the
considered pages are zero.




Figure 4.10: Considering the NoiseValue in the prepared level yields huge improvements.
Panels (a) and (b): 5 pages, prepared level; horizontal axis: frame; legend:
WeightedPixel NoiseValue, WeightedPixel Normal.
Figure 4.10 (a) and (b) show the results of WeightedPixel and HotSpot within this prepared
level. As we assumed, the overall quality increases if we scale the computed
priority by the accumulated NoiseValue. Furthermore we get much smaller peaks in
the case of rotations, since the heuristic concentrates on the pages that actually result
in a quality improvement.
Figure 4.11: Although not as strong as in the prepared level, we also get smaller peaks
in the case of rotations and better results overall.



We also tested this idea in the original level in order to see whether the quality
increases there as well. The results are shown in Figure 4.11. Although the performance
does not increase as strongly as in the case of the prepared level, the trend is the
same: better results overall and smaller peaks in the case of rotations.



4.2.4 Page Prediction 

The heuristics discussed so far suffer in their basic form from the fundamental problem
that they have to make their decision solely by using the information that is given
within the current frame. This essentially means that they cannot predict the future
need of pages. HotSpot is the only exception, since it indicates which pages are more
likely to stay visible within the next frame. But even HotSpot cannot say which pages
will become visible.

4.2.4.1 LookAhead Camera

One method to predict the future need of pages is the idea that we call a LookAhead
Camera. It is based on the assumption that if we translated or rotated the camera by a
certain amount in the last frame, we will probably do so again in the course of the
next one.

We track the motion that occurred during the current frame and build a second view
matrix that simply represents the transformation as if the motion occurred twice. We
then do a second render pass and use the resulting needbuffer during the analysis. A
visualization that we used for debugging purposes and that gives an example of the idea
can be seen in Figure 4.12.
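
One way to build such a second view matrix is to extract the motion of the last frame and apply it once more. The following is a sketch under assumptions, not the code of our renderer: view matrices are taken to be rigid 4x4 transforms (rotation and translation only) stored row-major, and all function names are hypothetical.

```python
import math
from typing import List

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m: Mat4) -> Mat4:
    """Invert a rigid transform: transpose the rotation, negate the rotated translation."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [inv_t[0]],
            r_t[1] + [inv_t[1]],
            r_t[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def look_ahead_view(prev_view: Mat4, cur_view: Mat4) -> Mat4:
    """Apply the motion between the previous and the current frame a second time."""
    delta = matmul(cur_view, rigid_inverse(prev_view))  # motion of the last frame
    return matmul(delta, cur_view)


def yaw_view(degrees: float) -> Mat4:
    """Helper for the example: rotation around the y-axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# A 45 degree turn in the last frame yields a LookAhead Camera turned by 90 degrees.
look = look_ahead_view(yaw_view(0.0), yaw_view(45.0))
```

The scene is then rendered a second time with the resulting matrix, and the needbuffer of that pass is used for the analysis.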




(a) Rotation around the y-axis

(b) Translation



Figure 4.12: Visualization of the LookAhead Camera in our implementation. The red 
marked rectangle shows a scaled version of what can be seen from the viewpoint of the 
LookAhead Camera in the current frame. If the user translates or rotates the camera, 
the LookAhead Camera will show the possible result of the next frame. 



Figure 4.13 (a) and (b) show a comparison of different HotSpot configurations. We 
see clearly that the overall performance gets better and that we can successfully di- 
minish the effect of fast rotations, which was the main intent of using the LookAhead 
Camera. 



(a) 5 Pages per frame    (b) 10 Pages per frame
Horizontal axis: Frame.

Figure 4.13: The LookAhead Camera leads to an overall better quality and smaller
peaks in the case of rotations.

But using the LookAhead Camera is not without its problems. Figure 4.14 shows a
peak that is the result of a very fast rotation which stops immediately within the next
frame. The LookAhead system is fooled by assuming that the next frame will feature a
rotation of similar amount. So an immediate rotation of 45.0 degrees, for example, will
yield a 90.0 degree rotation for the LookAhead Camera. This one frame that could
have been used for streaming highly needed pages is actually lost. Figure 4.14 (a)
and (b) show the results of this effect for HotSpot and HotSpot in combination with
the NoiseValue. Looking at Figure 4.15 reveals that this single frame is also lost in the
case where the system is allowed to stream twice as many pages.

We can conclude that it would probably be better to adjust the LookAhead Camera to
coincide with the viewing matrix of the current frame in the case that an immediate
rotation occurs. The second derivative of the rotation could be employed as an
indicator for this adjustment.
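
The proposed adjustment was not implemented in this thesis. The following sketch only illustrates how such an indicator could look for a rotation around a single axis. The function name and the threshold `max_accel_deg` are hypothetical, the threshold value is an arbitrary placeholder, and angle wrap-around is ignored.

```python
def look_ahead_yaw(yaw_prev2: float, yaw_prev: float, yaw_cur: float,
                   max_accel_deg: float = 10.0) -> float:
    """Predict the yaw of the next frame from the last three yaw angles (degrees)."""
    velocity = yaw_cur - yaw_prev          # first derivative (per frame)
    prev_velocity = yaw_prev - yaw_prev2
    acceleration = velocity - prev_velocity  # second derivative (per frame)
    if abs(acceleration) > max_accel_deg:
        # Abrupt change: do not extrapolate, fall back to the current view.
        return yaw_cur
    return yaw_cur + velocity


print(look_ahead_yaw(0.0, 0.0, 45.0))  # abrupt 45 degree turn, no extrapolation
print(look_ahead_yaw(0.0, 5.0, 10.0))  # steady rotation, extrapolated by 5 degrees
```

Whether a fixed threshold is adequate would have to be evaluated with the quality measure used throughout this chapter.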



4.2.4.2 Prediction within the shader 

At the end of our study we had an idea for a prediction within the fragment shader
that could be used during a translation along the z-axis in order to indicate whether a
page is likely to stay visible. The idea rests on the assumption that if we could
estimate the rate of change of the mip level of a page, we could store this value
in the needbuffer and later incorporate it into a heuristic. In contrast to the LookAhead
Camera, this technique does not require a second render pass. Unfortunately, we did
not have enough time to implement it in the course of this thesis. Furthermore, as we
thought it through, it became clear that fulfilling the task based on the information that
is available in a fragment shader is nontrivial. Because it would only work for
translations along the z-axis, one would have to weigh costs against benefits.

Figure 4.14: Problem of the LookAhead Camera. A fast rotation that stops immediately
fools the LookAhead Camera, and hence a frame that could have been used for
streaming the correct pages is lost.

4.2.5 Result 



In this section we investigated several ideas that can be used to select the needed
pages in an order that represents their importance for the current frame. Furthermore,
we tried to predict the future need of pages in order to diminish the effect that is caused
by motions, most importantly rotations.

We started with a set of basic heuristics and selected the best of them as a baseline for
further extensions. Figure 4.16 shows a concluding comparison of the basic heuristic
PixelSum and a combination of the investigated techniques that yields very good
results. While we achieve only small improvements over the basic heuristic when we
can stream many pages, we do significantly better in a setup that allows only a few
pages to be streamed.

It should be clear that it is not really possible to reach the same quality with just 5
pages per frame as a system that can stream twice as many, but we can get
closer to it. Our combination, HotSpot NoiseValue with LookAhead, performs much
better in the case of rotations and yields overall better results compared to PixelSum.
This is due to the different natures of these techniques: While the LookAhead Camera
allows the system to select pages that will probably be visible within the next frame,
HotSpot gives an indication of which of these are likely to stay visible. This is very
important, since it filters out those pages that will probably become useless within the
course of the next frames. Incorporating the NoiseValue into this combination narrows
the set of useful pages even more, since it ignores those pages that will have only a
small impact on the rendered image.

Figure 4.15: The problem also occurs if the system is allowed to stream twice as
many pages.



4.3 Ancestor Streaming 

Despite using a good page priority heuristic and employing some form of prediction,
we still experience LOD Snaps when we stream only the pages that are really
needed. As mentioned at the beginning of this chapter, we came to the conclusion
that it makes sense to implicitly load the ancestors of the needed pages in advance to
slowly fade into the higher resolution. As we investigated the problem, we started out
with two strategies to stream the ancestors of a page.

Figure 4.16: Although we cannot reach the same quality with just 5 pages per frame as
in the case of 10 pages, we achieve clearly distinguishable results in a setup that allows
only a small number to be streamed. The difference in (b) is not as great as in (a), but
this only underlines the importance of using such techniques within application areas
that suffer from high latencies.

Let P and Q denote two pages that have been identified as needed. This means that
they could be needed directly or passively by their children. In this case we could use
one of the following strategies.

InternMipMapOrder We make sure that we load page P before Q if P is an ancestor
of Q. If P is not an ancestor of Q, we simply let the priority decide which page has to
be streamed first.

ExternMipMapOrder We make sure that P is loaded before Q if P is part of a
lower resolution mipmap than Q.

We define the priority of an ancestor page as the sum of the priorities of all its chil- 
dren. This makes sense, since it will yield a higher priority for ancestor pages that 
have multiple children with high priorities. In order to ensure the properties of the Ex- 
ternMipMapOrder strategy, we simply modify the comparison function of the priority 
queue to check the level of detail before it compares the priority of two pages. 
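
Both orderings can be sketched with a priority queue whose sort key encodes the comparison. This is an illustrative sketch, not our implementation: pages are identified by a hypothetical tuple (level, x, y) in a quadtree, level 0 is assumed to be the coarsest mip level, and the priority of a needed page is propagated to all of its ancestors, which is one possible reading of the sum over the children.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

PageId = Tuple[int, int, int]  # (mip level, x, y); level 0 is assumed to be the coarsest


def parent(page: PageId) -> PageId:
    level, x, y = page
    return (level - 1, x // 2, y // 2)


def with_ancestors(needed: Dict[PageId, float]) -> Dict[PageId, float]:
    """Add all ancestors; an ancestor accumulates the priorities of its descendants."""
    priorities: Dict[PageId, float] = defaultdict(float)
    for page, priority in needed.items():
        priorities[page] += priority
        node = page
        while node[0] > 0:
            node = parent(node)
            priorities[node] += priority
    return dict(priorities)


def _drain(heap: list) -> List[PageId]:
    heapq.heapify(heap)
    return [heapq.heappop(heap)[-1] for _ in range(len(heap))]


def extern_mipmap_order(needed: Dict[PageId, float]) -> List[PageId]:
    """Compare the mip level first, then the priority (higher priority first)."""
    pages = with_ancestors(needed)
    return _drain([(page[0], -prio, page) for page, prio in pages.items()])


def intern_mipmap_order(needed: Dict[PageId, float]) -> List[PageId]:
    """Compare the priority first; the level only breaks ties.

    With non-negative priorities an ancestor never has a lower priority than
    its descendants in this sketch, so it is always streamed before them.
    """
    pages = with_ancestors(needed)
    return _drain([(-prio, page[0], page) for page, prio in pages.items()])


needed = {(2, 0, 0): 5.0, (2, 3, 3): 2.0}
print(extern_mipmap_order(needed))
print(intern_mipmap_order(needed))
```

In the example, the Extern ordering finishes both pages of level 1 before it touches level 2, whereas the Intern ordering completes the whole chain of the high-priority page first.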



Figure 4.18 (a) shows how both strategies behave in an indoor level test that conforms
to the setup we mentioned in Section 4.1.2. If we decided based only on how fast
a strategy increases the quality measured by SSIM, we would choose InternMipMapOrder.
Unfortunately, this fast increase in quality is misleading. Figure 4.17 shows an
example of why this is the case. InternMipMapOrder leads to rendered images in which
some parts of the scene stay at a low resolution for a long time, while the rest of the
scene is rendered in the anticipated quality. During our tests we had to admit that this
result is as disturbing as the LOD Snaps whose effect we wanted to diminish.
ExternMipMapOrder, in contrast, increases the quality in a uniform fashion, since it
ensures that all visible spots are on the same level of detail before it streams pages of
the next higher mipmap.

(a) InternMipMapOrder    (b) ExternMipMapOrder

Figure 4.17: InternMipMapOrder does not stream the ancestors uniformly, which can be
more disturbing on slower systems than loading all ancestors in a uniform fashion.

Figure 4.18: (a) Comparison of ExternMipMapOrder and InternMipMapOrder. (b)
NoiseValueSkipping reaches higher levels of quality faster.




Legend: Extern, 4 MipMaps locked; Extern, no MipMap locked; Intern, no MipMap locked.

Figure 4.19: (a) Locking the first four mip levels makes sense in the case that a high
page coherence cannot be guaranteed. (b) We have to accept an overall lower quality
in order to avoid LOD Snaps.



4.3.1 A closer look at ExternMipMapOrder

Since we identified InternMipMapOrder to be no improvement over LOD Snaps, we
took a closer look at the visually more satisfying ExternMipMapOrder.



NoiseValueSkipping In Section 4.2.3.2 we used the so-called NoiseValue to scale
the page priority. Due to the nature of the NoiseValue we can incorporate this idea into
ExternMipMapOrder to estimate whether it is possible to skip the ancestor of a page.
If the NoiseValue between a child and its parent page is lower than a certain threshold,
we can ignore the ancestor. We implemented this idea, and Figure 4.18 (b) shows
that we reach higher levels of quality faster. We used the mean NoiseValue of all
pages within the Virtual Texture as a threshold, but this could lead to false results if
the arithmetic mean is large. Trial and error could possibly lead to a better threshold
that is independent of the input NoiseValues.
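
The skipping rule can be illustrated with a short sketch. The data layout is an assumption made for the example: a chain lists a needed page followed by its ancestors, and every entry carries the NoiseValue of that page relative to its parent. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def pages_to_stream(chain: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the needed page plus those ancestors that are not skipped.

    chain[0] is the needed page and chain[i + 1] is the parent of chain[i].
    A parent is skipped if the NoiseValue between it and its child is below
    the threshold.
    """
    keep = [chain[0][0]]
    for (_, noise_to_parent), (parent_name, _) in zip(chain, chain[1:]):
        if noise_to_parent >= threshold:
            keep.append(parent_name)
    return keep


chain = [("leaf", 0.1), ("mid", 0.6), ("root", 0.0)]

# "mid" barely differs from "leaf" and is skipped; "root" differs a lot from "mid".
print(pages_to_stream(chain, threshold=0.25))
```

In our tests the threshold was the mean NoiseValue of all pages; the fixed value above only keeps the example easy to follow.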



Locking mipmaps Looking again at Figure 4.18 reveals that ExternMipMapOrder
loses much time in this example by loading the first 85 pages. The number 85 reflects
the fact that each page within the first four mipmaps is implicitly or directly
needed. We stated in Section 3.3.2.2 that a high page coherence cannot be guaranteed
by our tool chain. This means that pages from all over the Virtual Texture could be
visible from the current viewpoint. If such a scene property is predominant, we advise
locking the first four mipmaps within the cache. This means that the system loads the
respective pages once at initialization and never swaps them out of the cache.
Figure 4.19 (a) shows that doing so leads to the expected result.
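
The number 85 can be checked with a small calculation. It assumes a quadtree layout in which the coarsest mip level consists of a single page and every finer level has four times as many pages; the function name is hypothetical.

```python
def pages_in_first_levels(levels: int) -> int:
    """Number of pages in the `levels` coarsest mip levels of a quadtree."""
    return sum(4 ** k for k in range(levels))


print(pages_in_first_levels(4))  # 1 + 4 + 16 + 64 = 85
```

Locking these levels therefore reserves 85 cache slots, which has to be weighed against the total size of the cache.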



Testing with motion We also tested ExternMipMapOrder within the setup of
Section 4.2. Figure 4.19 (b) shows a comparison of our best configuration in which we
can see the impact of incorporating ExternMipMapOrder. Using ExternMipMapOrder
leads to slower quality improvements, which is quite natural since the tests we made
in Section 4.2 did not take ancestor streaming into account. This means that we have
to accept an overall lower quality in order to diminish the effect of LOD Snaps.



4.4 Terrains - Page Coherence 

Although we focused on indoor scenes during our study, we also ran some of the tests
in Section 4.2 with terrains.



We mentioned in Section 3.3.2.2 that we cannot guarantee page coherence between
faces that are geometrically near to each other. While this is valid for the indoor
scenes, the terrains we generated do not suffer from the missing page coherence, because
we can simply overlay the triangle network that represents them with a continuous area
within the Virtual Texture. Another difference between these landscapes and the indoor
scenes is the fact that the viewer is not completely surrounded by the level geometry.
Instead, the viewer sees the horizon, which is either represented by a skybox that is so
far away that it stays constantly on low resolution mip levels, or simply not rendered
at all.

Our tests yielded more favourable results compared to the indoor levels. This can
be explained by the predominant page coherence, which makes it possible to
improve the quality of neighbouring areas by streaming a smaller set of pages.
Furthermore, terrains feature a wide viewing range that allows the user to see areas far
away, so that the system can stream higher resolution fallbacks in advance. Although
the effect is not as large as in the case of indoor scenes, the rendering of terrains
suffers from the same problems as soon as rotations occur.

During our tests we came across a problem that is commonplace in terrain scenes,
while it does not often occur in indoor levels: small obstacles occlude parts of
the scene that become visible as soon as the viewer can look behind such an obstacle.
These obstacles are small hills in the case of landscape scenes. Figure 4.20 shows the
results of one flythrough we made. Without performing any rotations, we just translated
along the z-axis in order to fly over a sequence of small hills. Instead of a more
or less constant quality we see small peaks that are caused by the quality decrease
that occurs as soon as areas that were occluded become visible.

Figure 4.20: (a) The problem of small occlusions within the terrain becomes more or
less negligible if the system can stream a large number of pages. (b) The LookAhead
Camera leads to non-marginal improvements.



Figure 4.20 (a) shows that the impact of these occlusions becomes less relevant when
the system can stream many pages per frame. We again tried the advanced techniques
from Section 4.2 in order to weaken the loss in quality for a system that can only
stream a small number of pages. Figure 4.20 (b) and Figure 4.21 reveal that only the
LookAhead Camera has a non-marginal impact on the quality.



4.5 Summary 

In this chapter we investigated several techniques with the goal of improving the visual
quality by collecting different kinds of information on the needed pages and having them
streamed in a way that represents their importance for the current or next frame. In
order to measure the performance of these ideas, we introduced a method that is based
on the visual output. Provided with this method, we were able to compare different
system configurations and hence could conclude where specific techniques have a
positive impact on the quality. Furthermore, we took a look at how we can diminish the
visually disturbing LOD Snap effect and in which way the differences between indoor
and terrain scenes impact the quality.

Figure 4.21: No huge improvements by using HotSpot (in this case WeightedPixel,
since we do not rotate the camera) or incorporating the NoiseValue.






Chapter 5 
Conclusion & Outlook 



During the course of this thesis we developed a renderer that employs Virtual Textur- 
ing in order to store large texture sets within the limited size of available memory. An 
accompanying tool chain has been implemented that allows the user to create textures 
of multiple gigabytes and automatically retexture existing geometry. Furthermore we 
analyzed several techniques that, if incorporated into the system, can improve the qual- 
ity of rendering. 

Looking at the ubiquitous usage of web-enabled mobile devices, we can imagine mo- 
bile applications that employ Virtual Texturing in order to stream a vast amount of 
texture data from an internet server and use it during rendering on those low spec ma- 
chines. Because of the limitations within this area of application, e.g. high latencies, 
we feel confident that our investigation can be used as a starting point to make the 
visual output for the user as good as possible. 

The Virtual Textures we used during the thesis were uncompressed and hence quite
large. Although we have large storage media available today, it would be interesting
to incorporate realtime decompression techniques [vW06] and analyze their impact on
Virtual Texturing.

Furthermore, it would be worthwhile to delve more deeply into the development of
interactive tools purpose-built for Virtual Texturing. The tool chain presented in
Section 3.3 was sufficient for the course of this thesis, but in order to use
this technique to its full potential one will have to offer artists a set of more intuitive
applications.

Appendix A
Mathematical Background 



A.1 Full reference quality assessment

To estimate the quality of an image, one can employ different methods that are classified
by their dependency on reference images. Those that fall into the class of
full reference methods measure the difference in quality by comparing the distorted
image directly to another that is considered to be perfect.

A.1.1 Mean Squared Error 

Mean squared error is a widely adopted statistic to estimate the quality of an image. 
Its strength is the simplicity and rapidity with which it can be computed. 

Let us assume that we want to compare two images X and Y. Further, let $x_i$ and $y_i$
denote the colors of two corresponding pixels within the images, and let $N$ be the
number of pixels. The mean squared error between X and Y is then defined as

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \]

and obviously

\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \]

for the root mean squared error.
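
Both statistics are straightforward to compute. The following sketch assumes that the two images are given as flat sequences of gray values of equal length; the function names are chosen for this example only.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mse(x, y))


# Shifting every pixel by 12 gray levels gives an MSE of 12 * 12 = 144.
original = [10.0, 20.0, 30.0, 40.0]
shifted = [value + 12.0 for value in original]
print(mse(original, shifted), rmse(original, shifted))
```

A uniform brightness shift therefore produces the same MSE as far more disturbing distortions, which motivates the metric of the next section.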

A.1.2 Structural Similarity 

Using MSE and its variants for image quality estimation suffers from the problem
that these metrics do not deliver results that are consistent with human image
perception [Wan98]. In order to overcome this problem, metrics like the universal
image quality index [WB02] and its successor Structural Similarity [WBS+04] have
been developed, which take the analysis of the Human Visual System into account.
Figure A.1 shows an example in which MSE would yield misleading results.

(a) MSE = 0, SSIM = 1.0    (b) MSE = 144, SSIM = 0.988
(c) MSE = 144, SSIM = 0.694    (d) MSE = 142, SSIM = 0.662

Figure A.1: Example taken from [WBS+04] showing that SSIM differentiates the images
more clearly than MSE. (a) Original image (b) mean shift (c) blur (d) JPEG
compression.



Let $\mu_i$ denote the mean intensity of a block of pixels $i$ and $\sigma_i^2$ the
corresponding variance. Furthermore, let us assume that $\sigma_{i,j}$ represents the
covariance between two blocks $i$ and $j$. Then we can compute the SSIM value between
two pixel areas $x$ and $y$ by

\[ Q(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{x,y} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]

where $C_1 = (L \cdot K_1)^2$ and $C_2 = (L \cdot K_2)^2$ are included to avoid instability.
$L$ represents the range of color values, while $K_1$ and $K_2$ are constants $\ll 1$.

To calculate the overall SSIM value between two complete images, a sliding window
approach is used. A window of constant size (e.g. 8x8 pixels) moves over
both images completely and compares the corresponding pixel areas. Let $m$ denote the
number of areas that have been compared and $Q_i$ the SSIM value that resulted from the
analysis of the $i$-th area. Then the arithmetic mean of all SSIM values yields the
mean result between both images

\[ Q_{\mathrm{mean}} = \frac{1}{m} \sum_{i=1}^{m} Q_i \]
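
The windowed computation can be sketched as follows. This is a plain illustration of the formulas, not the code used for our measurements. The constants K1 = 0.01, K2 = 0.03 and L = 255 are assumptions taken from common SSIM practice, the variance is computed without Bessel's correction, and the step width of the window is a free parameter of this sketch.

```python
from typing import List, Sequence

Image = List[List[float]]  # rows of gray values


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def ssim_block(x: Sequence[float], y: Sequence[float], value_range: float = 255.0,
               k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM value Q(x, y) of two equally sized pixel blocks."""
    c1 = (value_range * k1) ** 2
    c2 = (value_range * k2) ** 2
    mu_x, mu_y = _mean(x), _mean(y)
    var_x = _mean([(a - mu_x) ** 2 for a in x])
    var_y = _mean([(b - mu_y) ** 2 for b in y])
    cov_xy = _mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_ssim(img_x: Image, img_y: Image, window: int = 8, step: int = 8) -> float:
    """Arithmetic mean of the SSIM values of all compared windows."""
    height, width = len(img_x), len(img_x[0])
    scores = []
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            rows = range(top, top + window)
            cols = range(left, left + window)
            block_x = [img_x[r][c] for r in rows for c in cols]
            block_y = [img_y[r][c] for r in rows for c in cols]
            scores.append(ssim_block(block_x, block_y))
    if not scores:
        raise ValueError("images are smaller than the window")
    return _mean(scores)


image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
brighter = [[value + 12.0 for value in row] for row in image]
print(mean_ssim(image, image), mean_ssim(image, brighter))
```

Setting `step` to 1 moves the window pixel by pixel, which is slower but closer to a true sliding window.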



A.2 Edge-compression based level of detail calculation 

In order to calculate the mip level of each fragment that is processed by our shader
(see 3.2.3) we use the approach that has been proposed by Paul Heckbert [HL83]. The
basic idea is to measure the compression that occurs when a parallelogram within
the texture gets mapped to the square area of a fragment. This compression can be
estimated by the maximal length of the parallelogram's edges. By knowing the partial
derivatives of the texture coordinates at the currently considered fragment

\[ s_x = \frac{\partial s}{\partial x}, \quad t_x = \frac{\partial t}{\partial x}, \quad s_y = \frac{\partial s}{\partial y}, \quad t_y = \frac{\partial t}{\partial y} \]

we can compute the lengths of these edges.

\[ e_x = \sqrt{s_x^2 + t_x^2}, \quad e_y = \sqrt{s_y^2 + t_y^2} \]

Choosing the maximal length

\[ e_{max} = \max(e_x, e_y) \]

provides us with the maximal compression. We scale $e_{max} \in [0, 1]$ with the dimension
of the maximal mip level in order to get a useful result. By employing the
logarithm we get the needed level within the pyramid.

\[ d = \log_2(e_{max} \cdot dim_{max}) \]

The OpenGL shading language GLSL [KBR09] provides everything we need to do a
straightforward implementation of the described theory.

Please note that we return uMaxMipMap - d in our implementation, so that we can
use the mip level enumeration we described in Section 3.1.1.



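uniform float uMaxDim;     // dimension of the maximal mip level
uniform float uMaxMipMap;  // index of the maximal mip level

float calculateMipMap()
{
    // Squared partial derivatives of the texture coordinates.
    vec2 dx = dFdx(gl_TexCoord[0].st);
    dx *= dx;

    vec2 dy = dFdy(gl_TexCoord[0].st);
    dy *= dy;

    // Length of the longer parallelogram edge (maximal compression).
    float e_max = sqrt(max(dx.s + dx.t, dy.s + dy.t));

    float d = log2(e_max * uMaxDim);

    // Clamp to the valid range of mip levels.
    d = min(d, uMaxMipMap);
    d = max(d, 0.0);

    // Convert to the mip level enumeration of Section 3.1.1.
    return uMaxMipMap - d;
}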
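
To check the arithmetic of the shader outside of OpenGL, the same steps can be mirrored on the CPU. This is a sketch for verification only. The derivatives are passed in explicitly, the function and parameter names are hypothetical, and a zero derivative is mapped to d = 0, which is the value the clamping in the shader is expected to produce as well.

```python
import math


def calculate_mip_level(ds_dx: float, dt_dx: float, ds_dy: float, dt_dy: float,
                        max_dim: float, max_mip: float) -> float:
    """CPU mirror of calculateMipMap for given texture coordinate derivatives."""
    e_max = math.sqrt(max(ds_dx ** 2 + dt_dx ** 2, ds_dy ** 2 + dt_dy ** 2))
    # log2 is undefined for zero, so treat a vanishing footprint as d = 0.
    d = math.log2(e_max * max_dim) if e_max > 0.0 else 0.0
    d = min(max(d, 0.0), max_mip)
    return max_mip - d


# A 1024 texel wide texture has mip levels 0 to 10 in this enumeration.
# One texel per fragment selects the full resolution level.
print(calculate_mip_level(1 / 1024, 0.0, 0.0, 1 / 1024, max_dim=1024.0, max_mip=10.0))
# Four texels per fragment select a level that is two steps coarser.
print(calculate_mip_level(4 / 1024, 0.0, 0.0, 4 / 1024, max_dim=1024.0, max_mip=10.0))
```

The returned value grows with the resolution, in line with returning uMaxMipMap - d in the shader.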
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